IMAGE DENOISING: LEARNING NOISE DISTRIBUTION VIA 
PDE-CONSTRAINED OPTIMIZATION 



JUAN CARLOS DE LOS REYES* AND CAROLA-BIBIANE SCHONLIEBt 

Abstract. Wc propose a PDE-constrained optimization approach for the determination of noise 
distribution in total variation (TV) image denoising. An optimization problem for the determination 
of the weights correspondent to different types of noise distributions is stated and existence of an 
optimal solution is proved. A tailored regularization approach for the approximation of the optimal 
parameter values is proposed thereafter and its consistency studied. Additionally, the differentiability 
of the solution operator is proved and an optimality system characterizing the optimal solutions of 
each regularized problem is derived. The optimal parameter values are numerically computed by 
using a quasi-Newton method, together with semismooth Newton type algorithms for the solution 
of the TV-subproblems. 
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1. Introduction. Let / € L p (f2), p = 1 or 2 with fi C R 2 , be a given noisy 
image. Depending on the application at hand the type of noise, i.e., the noise dis- 
tribution, changes [5]. Examples for noise distributions are Gaussian noise, which 
typically appears in, e.g. MM (Magnetic Resonance Tomography), Poisson noise 
in, e.g. radar measurements or PET (Positron Emission Tomography), and impulse 
noise usually due to transmission errors or malfunctioning pixel elements in camera 
sensors. To remove the noise a total variation (TV) regularization is frequently con- 
sidered [TTJ [T2 [T<2 [201 HZ] that amounts to reconstruct a denoised version u of / 
as a minimiser of the generic functional 

J( u ) = \Du\(n) + X<p(u,f), (1.1) 

with 

\Du\(n)= sup uV-gdx (1.2) 

geC = >o (0;R 2 ),||g!| oo <l J Q. 

the total variation of u in fi, A a positive parameter and <f> a suitable distance function 
called the data fidelity term. The latter depends on the statistics of the data /, which 
can be either estimated or approximated by a noise model known from the physics 
behind the acquisition of /. For normally distributed /, i.e. the interferences in / are 
Gaussian noise, this distance function is the squared L 2 norm of u — f. If a Poisson 
noise distribution is present, <fi(u, f) — J n A [u — / log u) dx, which corresponds to the 
Kullback-Leibler distance between u and / [29j [33] . In the presence of impulse noise, 
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the correct data fidelity term turns out to be the L 1 norm of u— f [32j|2l]. Other noise 
models have been considered as well, cf. e.g. [2]. The size of the parameter A depends 
on the strength of the noise, i.e. it models the trade-off between regularisation and 
fidelity to the measured datum /. 

A key issue in total variation denoising is an adequate choice of the correct noise model, 
i.e. the choice of <fr, and of the size of the parameter A. Depending on this choice, 
different results are obtained. The term <f> is usually modelled from the physics behind 
the acquisition process. Several strategies, both heuristic and statistically grounded, 
have been considered for choosing the weight A, cf. e.g. (TTJ [32 [53J (2H [22J ES] • In this 
paper we propose an optimal control strategy for choosing both cj> and A. To do so we 



extend model ( 1.1 1 to a more general model, that allows for mixed noise distributions 
in the data. Namely, instead of ( |1.1[ ) we consider 

min (\Du\(Q)+J2 I \i<t>i{u,f)dx\. (1.3) 



=i 



where (pi, i = l,...,d, are convex differentiable functions in u, and Xi are posi- 
tive parameters. The functions <f>i model different choices of data fidelities. In the 
case of mixed Gaussian and impulse noise d — 2, (f>i(u, f) = \\u — f\\ 2 L 2^ and 
02 (w, /) = \\u — /||L 1 (f2)- The parameters Xi weight the different noise models <j>i 
and the regularising term against each other. As such, the choice of these parameters 
depends on the amount and strength of noise of different distributions in /. Typically, 
the Xi are chosen to be real parameters. However, in some applications, it may be 
more favourable to choose them to be spatially dependent functions Aj : £1 — > R + , cf. 
e.g. m [I M (23125] ■ 

We propose a PDE-constrained optimization approach to determine the weights Xi 
of the noise distribution and, in that manner, learn the noise distribution present in 
the measured datum / for both d = 1 and mixed noise models d > 1. To do so, we 



treat (1.3) as a constraint and state an optimization problem governed by (1.3) for 
the optimal determination of weights. When possible, we replace the optimization 
problem by a necessary and sufficient optimality condition (in form of a variational 
inequality (VI)) as a constraint. 

Schematically, we proceed in the following way: 

1. We consider a training set of pairs (/fc,u/c), k — 1,2, ... ,N. Here, fk's 
are noisy images, which have been measured with a fixed device with 
fixed settings, and the images Uk represent the ground truth or images 
that approximate the ground truth within a desirable tolerance. 

2. We determine the optimal choice of functions Ai by solving the following 
problem for k = 1, 2, . . . , JV 

d 

^n min i , W^- u k\\l^n)+P^Ui\\x, (1.4) 

Ai>0, i— l,...,a v ' * — ' 



where u solves the minimization problem (1.3) for a given f^, X corre- 
sponds to K in the case of scalar parameters or to, e.g., L 2 (f2) in the case 
of distributed functions, and < (3 -C 1 is a given weight. 
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The reasonability of assuming to have a such a training set is motivated by certain 
applications, where the accuracy and as such the noise level in the measurements can 
be tuned to a certain extent. In MRI or PET, for example, the accuracy of the mea- 
surements depends on the setup of the experiment, e.g., the acquisition time. Hence, 
such a training set can be provided by a series of measurements using phantoms. 
Then, the u^'s are measured with the maximal accuracy practically possible and the 
/fc's are measured within a usual clinical setup. For instance, dictionary based image 
reconstruction methods are already used in the medical imaging community. There, 
good quality measurements or template shapes are used as priors for reconstructing 
u, cf. e.g. [2B], or for image segmentation, cf. e.g. |Mj and references therein. 

Up to our knowledge this paper is the first one to approach the estimation of the 
noise distribution as an optimal control problem. By incorporating more than one fa 



into the model (1.3 1 our approach automatically chooses the correct one(s) through 



an optimal choice of the weights \ in terms of (1.4) 



Organisation of the paper:. We continue with the analysis of the optimization 
problem ( 2.4 )-( 2.4b) in Section [2] After proving existence of an optimal solution and 



convergence of the Huber-regularized minimisers to a minimiser of the original total 
variation problem, the optimization problem is transferred to a Hilbcrt space setting 
where the rest of our analysis takes place in Section [3] This further smoothing of the 
regularizer turns out to be necessary in order to prove continuity of the solution map 
in a strong enough topology and to verify convergence of our procedure. Moreover, 
differentiability of the regularized solution operator is thereafter proved, which leads 
to a first order optimality system characterization of the regularized minimisers. The 
paper ends with three detailed numerical experiments where the suitability of our 
approach is computationally verified. 

2. Optimization problem in BV(Vt). We are interested in the solution of the 
following bilevel optimization problem 

d 

min A g(u)+pY,\\\f x (2.1a) 

Xi>0, i= l,...,a z — ' 

subject to 



i=i 



u = argmin MeWrU ^J(u) = \Du\(Q) + YlJ A ^*( u ' /) j > ( 2 -lb) 

where the space X corresponds to K in the case of scalar parameters or to a function 
space such that X L 2 (il) (where stands for continuous injection) in the case of 
distributed functions, g : L 2 (ft) H> E is a C 1 functional to be minimised and /3 > 0. 
The admissible set of functions A is chosen according to the data fidelities fa. In 
particular, BV(il) n A restricts the set of BV functions on f2 to those for which the 
fa's are well defined, cf. examples below. Moreover, we assume that the functions fa 
are differentiable and convex in u, are bounded from below, and fulfil the following 
coercivity assumption 

[ fa(u,f) dx>C 1 \\u\\ p LP -C 2 , VueLP(Q)nA (2.2) 

for nonnegative constants C%, C2 and at least one p = 1 or p = 2. Examples of fa's 
that fulfill these assumptions and that are considered in the paper are 
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The Gaussian noise model, where f Q cf>(u,f) dx = \\u — f\\%2/Q\ fulfills the 
coercivity constraint for p = 2 and the admissible set A = L 2 (fl). 
The Poisson noise model, where <p(u, /) = u—f log u and A = {u € L 1 (ft) \ u > 
0}. This <f> is convex and diffcrcntiable and fulfils the coercivity condition for 
p = 1. More precisely, we have for u > 



(u- /logu) > |MUi(f2) - ||/|U»(n) • log||w|| L i (0) , 



where we have used Jensen's inequality, i.e., for u > 



log 



u dx \ > I log u dx. 



The impulse noise model, where < 
coercivity constraint for p = 1. 



(u,/) (ix = ||u - /|| L i ( o) fulfills the 



For the numerical solution of (1.3) we want to use derivative-based iterative methods. 



To do so, the gradient of the total variation denoising model has to be uniquely 



defined. That is, a minimiser of (1.3) is uniquely characterised by the solution of the 



corresponding Euler-Lagrange equation. Since the total variation regulariser is not 
diffcrcntiable but its "derivative" can be only characterised by a set of subgradients 
(the subdiffcrcntial) , we (from now on) shall use a regularised version of the total 
variation. More precisely, we consider for 7> 1 the Huber-type regularisation of the 
total variation with 



, , [iVd-i iflVd^ 

\vuU = V. L, 27 ,7 

if |V«| < i 



|v«| 2 i 



(2.3) 



and the following regularised version of ( 2.1 )-( 2.1b ) 



min g(u) + PV] \\Xi\\ 2 x 

0, %— L — * 



i=l 



(2.4a) 



subject to 



argmin^^i.ip,^ < J y (u) = / |Vu| 7 dx + V] / \i<j>i{u, /) > , (2.4b) 



where the space X, g, ^'s and /3 > are defined as before. The admissible set of 
functions A is assumed to be convex and closed subset of T4 7l,1 (fi) and is chosen 
according to the data fidelities <pi, cf. examples above. The existence of an optimal 
solution for ( 2.4b[ ) is proven by the method of relaxation. To do so we extend the 
definition of T 1 to BV(Cl) as 



j^{u) ue w 1 ' 1 (n)c\A 

+oo u g BV(n) \ {w 1 ' 1 n A) 



and prove the existence of a minimiser for the lower-semicontinuous envelope of J2 x t 
as follows. We have the following existence result. 
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Theorem 2.1. Let f G L 2 (f2) and Ai > /ixed. T/ien there exists a unique solution 
u G BV(f2) fli 0/ i/ie minimisation problem 



min ^relax( U )> 



where 



JrelaM = [ l V ^7 *X + C [ \D s u\+ V / Ai&(«, /) <fc. (2.5) 



is i/ie relaxed functional of J2 X ± on BV — w*. 
Remark 2.1. 7Vo£e that 

JlelaM < J e l t (u), UGBV(Q) 

and Jreiax( u ) = ^ext( u ) f or u G W ' (fi) fl .4. Moreover, the relaxation result from 
Theorem \2.1\ means that 



^relax( U ) = mf { n ™ inI ^ext ( U n ) ■ U n G BV (fl) , U„ -> U in W - j , 

i-e. Jreiax * s ^ e g^atest BV — w* lower semicontinuous functional less than or equal 



to X' x 



Proof. Let u„ be a minimising sequence for J^eiax- We start by stating the fact that 



is coercive and at most linear. That is 



For |x| > - : A\x\ — B < |xL = |a;| - — < \x\ + 1 
7 2 7 



For |a;| < - < 1 : A\x\ - B <= \xL = \x\ 2 ?- < \x\^. 

7 2 2 



Hence, 



\Du n \(n)= / \Vu n \dx + \ \D s u\< / |Vu| 7 cfe + c/ \D s u\ < M, Vra > 1. 
JO Jo Jo Jo 

Moreover, u n is uniformly bounded in L p (£l) for p = 1 or p — 2 because of the 
coercivity assumption ( |2.2[ ) on and therefore u n is uniformly bounded in BV(fl). 
Because BV(fl) can be compactly embedded in L 1 (J7) this gives that u n converges 
weak * to a function u in BV(fl) and (by passing to another subsequence) strongly 
converges in L 1 ^). From the convergence in L^fi), fl bounded, we get that u n (up 
to a subsequence) converges pointwise a.e. in O. Moreover, since 4>i is continuous, also 
4>i(. u n,f) converges pointwise to (f>i(u,f). Then, lower-semicontinuity of R(\Du\) = 
J n |Vu| 7 dx + C J n \D s u\ w.r.t. strong convergence in L 1 [T^ and Fatou's lemma 
together with pointwise convergence applied to J n 4>i(u n , f) dx gives that 

Jrelax(^)= [ |V«| 7 dx + C [ |£> flU | + W X *M U J) dx 

Jo Jo j=1 Jo 

< liminf J? elax {u n ) = liminf / | Vu n | 7 dx + C \D s u n \ + / \i<j)i(u n , f) dx J 



To see that the minimiser lies in the admissible set A it is enough to observe that 
the set A is a convex and closed subset of BV(£l) and hence it is weakly closed by 
Mazur's Theorem. This gives that u € A. To see that in fact J^ e i ax is the greatest 
lower-semicontinuous envelope of J^ xt see [HI [TJ HI H] ■ □ 

Theorem 2.2. There exists an optimal solution to 

d 

min a(«)+/?Ell A *H* ( 2 ' 6a ) 

i=l 

subject to 

u = ^gmm ueBV(n)nA J^ lax (u). (2.6b) 



Proof. Since the cost functional is bounded from below, there exists a minimizing 
sequence {A„} = {X n (u n )} C X d . Due to the Tikhonov term in the cost functional, 
we get that {A n } is bounded in X d . Let u n be a minimiser of J^ e i ax for a corresponding 
A„. Such a minimiser exists because of Theorem |2.1| Hence, 



^relax( Un ) — ^relax^) 

[ \Vu n \^dx + C [ \D s u n \+Y^( {\i) n <j>i{u n ,f) alx<Y^[ (Xi) n fa(0, /) dx 
Jn Jn i=1 Jn i=1 Jn 

As before, from the coercivity condition on / and the uniform bound on A„, we deduce 
that 

d p dp 

c\D Un m)<c\Du n m)+y* / E / ( A *)n<M°>/) dT 

<=i i=i 70 



<2(U|(A;)„|U -||-,(0./!| lL 
< C. 

Moreover, from the coercivity of fa in u n we get with a similar calculation that u„ is 
uniformly bounded in L p for p = 1 or 2, and hence in particular in L . In sum, u„ is 
uniformly bounded in BV(£l) and hence, converges weakly * in BV(fl) and strongly in 
L 1 (f2). The latter also gives pointwise convergence of u n and consequently fa[u n , f) 
a.e. and hence we have 

:= f |Vu| 7 dx + C [ \D s u\+S2 [ Kfa(uJ)dx 
Jn Jn i=x Jn 

< liminf I J |Vu„| 7 da; + C J |D s u„|+E y ( A i)„<M M ™,/) ctej 



i=l 



liminf J r 7 elm K,A n ) 



Since the cost functional is w.l.s.c, it follows, together with the fact that jA : A > 0} 



is weakly closed, that (X,u) e(ln {X t > 0} x (BV(Q) n A) is optimal for ( |2.4| ) 
□ 



Theorem 2.3. The sequence of junctionals J^ e i ax in (2.5) converges in the T- sense 
to the functional 

JreteW-| +oo ueBV(n)\(w^(n)nA) 

as 7 — > oo. Therefore, the unique minimiser of J^ elax converges to the unique min- 
imiser of Jreiax as 7 goes to infinity. 

Proof. The proof is a standard result that follows from the fact that a decreasing 
point wise converging sequence of functionals T- converges to the lower semicontinuous 
envelope of the point wise limit [HI Propostion 5.7]. In fact, J n |V«| 7 + ^ decreases 
in 7 and converges pointwise to J n |Vtt|. Then, for u € BV(£l) the functional J 1 
(being the lower-semicontinuous envelope of J2 x t) T- converges to the functional J n 



1 



in Theorem |2.3| The latter is the lower-semicontinous envelope of the functional in 



Although Theorem |2.3| provides a convergence result for the regularized TV subprob- 
lems, it is not sufficient to conclude convergence of the optimal regularized weights. 
For this we need the continuity of the solution map A — > u(X). Up to our knowl- 
edge, no sufficient continuity results for the control-to-state map in the case of a total 
variation minimiser as the state are known. There are various contributions in this 
directions [14] (3TJ |38j [39] which are - as they stand - not strong enough to prove the 



desired result in our case. Indeed, this is a matter of future research. 

3. Optimization problem in H^ft). In order to obtain continuity of the solu- 
tion map and, hence, convergence of the regularized optimal parameters, we proceed 
in an alternative way and move, from now on, to a Hilbcrt space setting. Specifically, 



we replace the minimisation problem ( 1.3 1 by the following elliptic-regularized version 
of it: 




Du\\ 2 L2 + \Du\(Q) + V f Xi cj>i(u) dx] . (3.1) 
<=i Jn J 



where < s <ti 1 is an artificial diffusion parameter. 



A necessary and sufficient optimality condition for (3.1) is given by the following 
elliptic variational inequality: 



e(Du, D{v — u))i,2 + / Xi<j)'i(u)(v — u) dx 
i=i Jn 

+ I \Dv\ dx - I \Du\ dx>0 for all v € H^(Q). (3.2) 

Note that by adding the coercive term, we implicitely impose the solution space Hq (Q) 
(see US]). 
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Our aim is to determine the optimal choice of parameters A^, i = 1, d, by solving 
the following optimization problem: 



- i=l 



(3.3a) 



subject to 



d „ 

e(Du, D(v — u)) L 2 + \_] / A,; <^(u)(v — it) ote 

+ / |Dt>| dx- \Du\ dx>0 for all v E #o(fi), (3.3b) 
Jn Jn 

where the space X corresponds to R in the case of scalar parameters or to a Hilbert 



function space in the case of distributed functions. Problem 3.3 corresponds to an 



optimization problem governed by a variational inequality of the second kind (see |17) 
and the references therein). 



Next, we perform the analysis of the optimization problem (3.3 1. After proving exis 



tence of an optimal solution, a regularization approach will be also proposed in this 
context. We will prove the continuity of the control-to-state map and, based on it, 
convergence of the regularized images and the optimal regularized parameters. In the 
case of a smoother regularization of the TV term, also differentiablity of the solution 
operator will be verified, which will lead us afterwards to a first order optimality 



system characterizing the optimal solution to (3.3) 
We start with the following existence theorem. 



Theorem 3.1. There exists an optimal solution for problem (3.3). 
Proof. Let {A„} C X be a minimizing sequence. Due to the T ikhonov term in the 



cost functional, we get that {A„} is bounded in X d . From (3.3b) we additionally get 
that the sequence of images {u n } satisfy 

d 

£ \\ u n\\ 2 H i + z] / A ni [4>'i{Un) - 4>'i{ti)]u n dx 
° i=l Jn 

+ [ \Dun\ dx < - V f \ ni 4>'Mu n dx, (3.4) 
Jn i=1 Jn 

which, due to the monotonicity of the operators on the left hand side, implies that 

a 



e\\u n f Hh <]T||A„Jx||^(0)|k,|l^lL„ 



(3.5) 



for 2 < p < +oo and r = Thanks to the embedding H^(Q) ^ L p (n), for all 

1 < P < +oo, we get that 



(3.6) 



1=1 



for some constant C > 0. Consequently, 

{«„} is uniformly bounded in iJg(f2). (3-7) 

Therefore, there exists a subsequence {(u n , A„)} which converges weakly in -ffg(O) x 
X d to a limit point (u, A). Moreover, u„ — > m strongly in L p (f2) and, thanks to the 
continuity of 0', also 

c/)'(u n )u n cj)'(u)u strongly in L 5 (17) (3.8) 



Consequently, thanks to the continuity of <fi' and the properties of a(-, •), we get 

a(u,u)+V / \ i (j)' l {u)u+ / \Du\ 
i=1 Jn Jn 

< liminf a(u n ,u n ) + V] / A ni ^(tt„)w„ + / |-Du n | 

< liminf a(u„, u) + V" / A n j^(u n )u+ / |£>u| 

i=1 Jo Jo 

= a(w,w) + V / A i ^(u)t>+ f \Dv\. 
i=1 Jn Jn 

Since the cost functional is w.l.s.c, it follo ws, t ogether with the fact that {A : A > 0} 
is weakly closed, that (A, u) is optimal for (3.3). □ 



Next, we consider the following family of regularized problems: 

d . 

e(Du 7 ,Dv) + (h 7 (Du-y),Dv)+Y] / A, ^-(u 7 > = 0,Vu € H^(Q) 

Jo 



(3.9) 



where h^(Du-y) corresponds to an active-inactive-set approximation of the subdiffcr- 
ential of |£hz 7 |, i.e., h 1 coincides with an element of the subdiffcrcntial up to a small 
neighborhood of 0. The most natural choice is the function 



h-y(z) 



72: 



max(l, j\z\) 



(3.10) 



which corresponds to the derivative of the Huber function defined in (2.3). An alter- 
native regularization is given by the C 1 function 

h i( z ) = \ R(5-i(ff-7kl + ^) 2 ) if <l\z\ <g+± (3.11) 

[l z if l\ z \ < g- ^, 

where g is a positive parameter. The latter smoothing for the total variation is going 
to be used in Proposition 3.4 where differentiability of the regulariser is needed. 



Remark 3.1. For a fixed A, it can be verified that (3.9) has a unique solution. 
Moreover, the sequence of regularized solutions {u 7 } converges strongly in Hq(Q) to 
the solution of Q (cf. |77f). 
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Based on the regularized problems (3.9), we now focus on the following optimization 
problem: 

d 

min fl(u)+0X;i|Ai|& (3.12a) 

Av>0, 1=1,. ...a * — * 
_ i=l 

subject to 

d „ 

e(Du 1 ,Dv) + (h 1 (Du 1 ),Dv)+y2 A, <^(it 7 )t; = 0,Vu G Hq(Q), (3.12b) 

where < e <C 1. In this setting we can prove the following continuity and convergence 
results. 

Proposition 3.2. Let {A„} be a sequence in X d such that X n — 1 A weakly in X d as 
n — > oo. Further, let u n := u 7 (A n ) denote the solution to (3.12b) associated with X n 
and u :— u 7 (A). TTiera 

it n — > u strongly in H (Q). 



Proof. Since A„ — »■ A weakly in X, it follows, by the principle of uniform boundedness, 
that {A„} is bounded in X. From (3.12b) we additionally get that 

d 



£ \\ u n\\ 2 H i + y~] / A ni [4>'i{u n ) - 4>i(0)) u n dx 
° i=i J " 

d „ 

+ (/i 7 (£>n„),£)ti n ) < - V / A„i ^(0)w« da?, (3.13) 



in 

which, proceeding as in the proof of Theorem |3.1[ implies that 

d 



K\\ H i<Cj2Uni\\xM(0)hr, 



(3.14) 



i=l 



for some constant C > 0. Hence, {u n } is uniformly bounded in Hq(£T). 
Consequently, there exists a subsequence (denoted the same) and a limit u such that 

u n — 1 u in _//q(57) and u n u in L p (f2), 1 < p < +oo. 



Thanks to t he s tructure of the regularized VI (3.12b) it also follows (as in the proof 
of Theorem 3.1 ) that u is solution of the regularized VI associated with A. Since the 



solution to (3.12b) is unique, it additionally follows that the whole sequence {u n } 
converges weakly towards ii. 

To verify strong convergence, we take the difference of the variational equations sat- 
isfied by u n and u and obtain that 

e(Du n — Du, Dv) + (h 7 (Du n ) — h 1 (Du), Dv) 

A (f)'{u) - X n (t>'(u n )] v dx,\Jv G H^(fl). 
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Adding the term — A„ <fi'(u) on both sides of the latter yields 



e(Du n — Du, Dv) + (h 7 (Du n ) — h 7 (Du), Dv) 

[A„ 4>'(u n ) — A n <j>'{u)\ v dx = A 4>'[u) — A„ 4>'(u) v dx,Vv € /fg(f2). 
si Jn L J 

Choosing v = u n — u and thanks to the monotonicity of the operator on the left hand 
side, we then obtain that 



e\\Du„ - Du\r < 



A <f> (u) — A n <f> (u) (u n — it) dx 



which thanks to the strong convergence u n — > u in L P (Q), 1 < p < +00, and the 
regularity <p'(u) £ implies the result. □ 



Theorem 3.3. There exists an optimal solution for each regularized problem (3.12|. 



Moreover, the sequence {A 7 } of regularized optimal parameters is bounded in X d and 
every weakly convergent subsequence converges towards an optimal solution of (3.3 1. 



Proof. Let {A n } be a minimizing sequence. From the structure of the cost functional 
and the properties of (3.12b I it follows that the sequence is bounded. Consequently, 
there exists a subsequence (den oted the same) and a limit A* such that A„ — A* 
weakly in X d . From Proposition 3.2 and the weakly lower semicontinuity of the cost 



functional, optimality of A* follows, and, therefore, existence of an optimal solution. 



Let now {A 7 } 7>0 be a sequence of optimal solutions to (3.12 1. Since (0,0) € Hq(Q) x 
X d is feasible for each 7 > 0, it follows that 

J(u 7 (A 7 ),A 7 ) < J(u 7 (0),0) = J(0,0). 

Thanks to the Tikhonov term in the cost functional it then follows that {A 7 } 7 >o is 
bounded. 

Let A be the limit point of a weakly convergent subsequence (also denoted by {A 7 }). 



From Remark 3.1 and Proposition 3.2 it follows, by using the triangle inequality, that 
u 7 (A 7 ) — > u strongly in Hq(H) as 7 — > 00, 



where u denotes the solution to (3.2) associated with A 



From the weakly lower semicontinuity of the cost functional, we finally get that 
J(n, A) < liminf J(u 7 (A 7 ), A 7 ) < liminf J(m 7 (A), A) = J(u, A), 

7— >oo 7— >oo 



where A is an optimal solution to (3.12 1. □ 



The next proposition is concerned with the differentiability of the solution opera- 



tor. This result will lead us thereafter (see Theorem 3.5) to get an expression for 
the gradient of the cost functional and also to obtain an optimality system for the 



characterization of the optimal solutions to (3.3). 

Proposition 3.4. Let G 1 : X d n- Hq(Q) be the solution operator, which assigns 



to each parameter A the corresponding solution to the regularized VI (3.9), with the 



function h 7 given by (3.11). Then the operator G 7 is Gateaux differentiable and its 
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derivative at X, in direction £ ; is given by the unique solution z G Hq(Q,) of the 
following linearized equation: 

e(Dz, Dv) + (h' 7 (Du)Dz, Dv) + V / Aj <f//(u) z v dx 

i=i Jn 

d „ 

+J2 / & w dx = °> ^ or °" u G ff o(°)- ( 3 - 15 ) 

„■_-, in 



Proof. Existence and uniqueness of a solution to (3.151 follows from Lax-Milgram 



theorem by making use of the monotonicity properties of h 7 and fy. 



Let £ € X d , and let y t and j/ be the unique solutions to (3.9) correspondent to A + 1£ 



and A, respectively. By taking the difference between both equations, it follows that 

e(D(u t — u), D{u t — u)) + {h 1 {Du t ) — h 7 (Du), D{u t — u)) 

+ X t (4>'(u t ) - 4>'{u),u t —u) = -t(£<j>'(u), u t - u), (3.16) 

which, by the monotonicity of /i 7 and <j)' yields that 

«||«f - uf Hh < t U\\xAW(u)\\ L r\\u t - u\\lp. (3.17) 

Therefore, the sequence {zt}t>0: with z t := Vt ^ v , is bounded and there exists a 
subsequence (denoted the same) such that zt — z weakly in Hq(Q). 

Using the mean value theorem in integral form we get that 

a(z t , w) + ^ (l h (Du t ) - h 7 (Du), Dw) + ~ (A t (0> t ) - 4>'{u)),w) (3.18) 
= a(z t ,w)+ [ (tiJ<& t )Dz t ,Dw) dx+ [ \ t 4>" {Ct)w dx (3.19) 



7 

n a 
= -(Z<j/(u),w), for all w G V, (3.20) 

where i)t(x) — Du{x) + pt(x){Dut(x) — Du(x)), with < pt(x) < 1, and (t(x) — 
u(x) + Qt{x){u t {x) — u(x)), with < Qt(x) < 1. 

From the continuity of the bilinear form it follows that a(zt,w) — ¥ a(z,w), for all 



w € V. Additionally, from the consistency of the regularization (see Remark 3.1), 
Ut — >■ u strongly in Hq(H) and, therefore, i9 f — > Du strongly in L 2 (f2) and Qt u 
strongly in Hl(Vt). 

Introducing the function 

!g tfi\x\>g + ^, 

g-i(g-i\A + ^? if \i\x\-g\ < ^, 

7M if i\A < g- ^, 

the regularizing function may be written as hj(x) — ^ X-y( x ) arL d the second term 
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in (3.191 can be expressed as 

(h^ t )Dz t ,D W ) ^ = /^)(^-^^^|,^) dx 
si 

+ J ^(# t )[Dw]^,Dzt) dx. 

Q 

be the operator defined by 

m ■■= x 7 (0|| - +x;(om^, (3.2i) 

with w G K 2 . When considered from L 9 (f2) to h q (il), $ is a continuous superposi- 
tion operator. Therefore, since dt — > Du strongly in h 2 (fl) and thanks to the weak 
convergence of z t and continuity of <fi" , we may pass to the limit in ( 3. 19 )-( 3.20 1 and 
obtain that 

a(z,w) + I (h'^DujDz , Dw) dx 

\<t>"{C,)w dx - I £<p'(u) w dx, for all w G #q(0). (3.22) 



Consequently, z G Hq(CI) corresponds to the unique solution of the linearized equa- 
tion. 



Using again the function the operator on the left hand side of equation (3.22 1 
may be written as 



/ Dw T M(x)Dz dx, 
Jn 



where 



M(x) := 2,1 + ^ll - *Z™ [Dy(x)Dy(x) T ] 

+ X57 ^ (5 ~ 7|jDy| + 2V [ D y( x ) D y( x f] +Xli \f^ [ D y( x ) D y( x ) T ] > (3-23) 

where xs~t an d Xz~> correspond to the indicator functions of the sets := {x *E fl : 
\l\Dy\ - g\ < ^} and I 1 := {i £ f! : y\Dy\ <g - i}, respectively 

Similarly, by replacing Dy with d t in (3.23) a matrix denoted by M t is obtained. Both 
matrices M and M t are symmetric and positive definite. Using Cholesky decomposi- 
tion we obtain lower triangular matrices L t and L such that 

M t (x) = L t {x) Lf(x) and M(x) = L{x) L T (x). 

Proceeding as in [101 pp. 30-31] (see also pTJ Thm. 6.1]), strong convergence of z t — > z 
in Hq(H.) is obtained, and, thus, also Gateaux differentiability of G 7 . □ 

Theo rem 3.5 (Optimality system). Let (X,u) be an optimal solution to problem 
(3.12) with X = R d . There exist Lagrange multipliers (p, /i) G Hq{SI) x R d such that 
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the following optimality system holds: 



d „ 

e(Du,Dv) + (h y (Du),Dv) + V / A, dx = 0,W G #o(ft), (3.24a) 



e(£>p, D«) + (ti JDu)*Dp, Dv) 



K p w = -(</(«), u),Vu € ^(n), (3.24b) 



/i, = 2/3A j; + / p^u) dx, i = l,...,d, 

Jn 



> 0, A l > 0, ^A,; = 0, i = 1, 



(3.24c) 
(3.24d) 



Proof. Consider the reduced cost functional 

f(\) = g(G y (\)) + m\l<- 



(3.25) 



Thanks to the optimality of A and the differentiability of both G 7 and g it follows 
that 

V/(A) T (C — A) > 0, for all f > 0. (3.26) 
Let p e H&(Cl) be the unique solution to the adjoint equation: 

e(Dp, Dv) + (h\(Du)*Dp, Dv) 

d „ 

+ J2 \ i tf(u)pvdx = -(g'(u) ) v) ) Vv€Hi(n). (3.27) 



Indeed, existence and uniqueness of a solution to (3.27) follows from the Lax-Milgram 
theorem, similarly as for the linearized equation. 

Using the adjoint equation it follows that 
V/(A) T e = ( 5 '(w),G;(sK) + 2/3A T ^ 

d „ 

= 2/3A T £ - e(Dp, Dv) - (h'JDu)*Dp, Dv) - V / A, p v dx, 



which, utilizing the linearized equation (3.15), yields that 

d 



V/(A) T £ = 2£A T £ + V I #(u)p£ 



(3.28) 



Let ^ := 2f3Xi + J Q <^(u)p, i — I, ...,d. From (3.26) and (3.28) it then follows that 

Ai T (£ - A) > 0, for all £ > 0, 
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which is equivalent to the complementarity system (3.24d). □ 



Remark 3.2. In the case of a general parameter Hilbert space X d , an optimality 



system constituted by equations (3.24a), ( 3.24b ] and the variational inequality 

2/3(A, £ - X) xd + (<j)'(u)p, £ - A) L 2 > 0, for all £ G X d : £ > a.e. 
is obtained. 

4. Numerical solution of the optimization problem. In this section we fo- 



cus on the numerical solution of the optimization problem ( 3.3 ). For the determination 
of the optimal parameter values we consider a projected BFGS (Broyden-Fletcher- 
Goldfarb-Shanno) method. In each computational experiment, the state equation is 
solved by means of a Newton type algorithm (specified in each case), while a for- 
ward finite differences quotient is used for the evaluation of the gradient of the cost 
functional. Due to the high computational cost of solving the state equation, a fixed 
line search parameter value a = 0.5 was considered in combination with the BFGS 
method. The linear systems in each Newton iteration are solved exactly by using a 
LU decomposition for band matrices. 

4.1. Gaussian noise. As a first example we consider the determination of a 
single regularization parameter, i.e., A £ 1. The optimization problem takes the 
following form: 

min - u \\ 2 L 2 + (i\ 2 (4.1a) 

subject to: 



e(Du, D(v — u))l2 + / X(u — u n )(v — u) dx 
Jn 

+ j \Dv\dx- [ \Du\ dx > 0,Vv G H%(Sl), (4.1b) 
Jn Jn 

where u Q and u n denote the original and noisy images respectively. The problem 
consists therefore in the optimal choice of the TV regularization parameter, if the 
original image is known in advance. This is a toy example for proof of concept only. 
In practice this image would be replaced by a training set of images as motivated in 
the Introduction. 

For the numerical solution of the regularized variational inequality we utilize the 
primal-dual algorithm developed in (2"5] , which was proved to be globally as well as 
locally superlinear convergent. 

The results for the parameter values j3 = 1 x 10 _10 ,e = 1 x 10~ 12 ,7 = 100 and 



h = 1/177 are shown in Figure 4.1 together with the noisy image distorted by Gaus- 
sian noise with zero mean and variance 0.002. The computed optimal parameter for 
this problem is A* = 2980 For a noise of mean and variance 0.02, and the same reg- 
ularization parameters as in the previous experiment, the optimal image is obtained 



with the weight A* = 1770.9. The noisy and denoised images are given in Figure 4.1 

By increasing the variance in the noise, the optimal values of the weight differ signifi- 
cantly. This is intuitively clear, since as the image becomes noisier there is less original 
information that can be directly obtained. When that happens, the TV regularization 
plays an increasingly important role. 
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Fig. 4.1. Noisy (left) and denoised (right) images. Noise variance: 0.002 (first row) and 0.02 
(last row). 



#pixels 


60 


65 


70 


75 


80 


85 


A* 


674.6 


742.9 


788.9 


855.0 


885.8 


933 



Table 4.1 

Optimal weight vs. mesh size; fi = le — 15, 7 = 100, = le — 10. 



The question of robustness of the optimal parameter value deserves also to be tested. 
In Table |4.1| we compute the optimal weight for different sources of the noisy image 
(different total number of pixels). The value of the optimal weight increases together 
with the size of the image from which the information is obtained. The variation 
remains however small, implying a robust behavior of the values. 

4.2. Magnetic resonance imaging. Gaussian noise images typically arise within 
the framework of magnetic resonance imaging (MM). The challenge in this case con- 
sists in training the machines so that a clearer image is obtained. The magnetic 
resonance images seem to be the natural choice for our methodology, since a training 
set of images is often at hand. 



For such a training set we consider the solution of problem (4.1). In Figure 4.2 
the noisy images together with the final optimized ones for a brain scan are shown. 
For this experiments a mesh step size of h = 1/250 was considered. The Tikhonov 
regularization parameter took the value ft — 10~ 10 , while the Huber regularization 
parameter was chosen as 7 = 100. With this values, the optimal parameter value for 
the MRI image with 3% of noise was A* = 64.1448. When the noise in the image was 
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Fig. 4.2. Image with 3% noise (upper left) and its correspondent optimal denoised one (upper 
right); noisy image with 9% noise (lower left) and optimal denoised image (lower right) 



of 9%, the computed optimal weight was A* = 26.7110. 

4.3. Training objective- multiple parameters (Gauss+Poisson). The im- 
portance of controlling the regularization in a denoising problem becomes clear once 
two different noise distributions, modeled by fidelity terms weighted by non-negative 
parameters Ai and A2, are present in an image. In particular, in the following exper- 
iment we shall solve the optimization problem 

! 2 

mm -\\u-u \\ 2 L2 +/3^||A,|| 2 

_ i=l 

subject to: 

min \^\\Du\\ 2 L2 + \Du\(Q) + ^-\\u - u n \\ 2 L2 + A 2 / (u - u n log u) dx \ . (4.2) 



For the characterization of a minimizer of (4.2 1 we (formally) get the following Euler- 
Lagrange equation 

+M(u- u n ) + A 2 (l -) - a = 

max(7| vu\, 1) / u 

a ■ u = 0, 

with non-negative Lagrange multiplier a £ L 2 (f2), cf. [27 . As in [33j we multiply the 
first equation with u and get 

eAu - div ( 7 ^ 1 T\ I + M u ~ «n) J + A 2 (m - u„) = 0, 

\max(7| Vu|, 1) / / 
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Fig. 4.3. SSN residuum in the last 4 iterations, fi = le — 4,7 = 50, (3 = le — 10, h = 1/60, u\ 
769.2199, u 2 = 30.6396 



where we have used the complementarity condition a ■ u = 0. Next, the solution u is 
computed iteratively by using a Newton type method. 

For an appropriate initial guess u , an iteration of the semismooth Newton method 
consists in solving the system 

8 U (— eAu — divq + Xi(u — u n )) + u (— eAS u — divS q + X\S U ) (4-3) 
+\28 u — —u (—eAu — divq + \i(u — u n )) — \2(u — u n ), 



~lV5 u 2 X7u T \7S u V« 7VM 

9 ~ max(l,7|Vu|) +X ^ 7 max(l, 7|Vu|) 2 ]Vu| _ + max(l, 7 | Vu|) ' 



for the increments S u and In equation (14. 41), stands for the indicator function 
of the active set A-y :={i€ll: 7|Vu(ir)| > l}. 

Similarly to [H], we consider a modification of the iteration based on the properties 



of the solution to (4.2 1. Specifically, noting that q = on the final active set and 



that \q\ < 1, we replace the term by max (i 9 7 |VM|) on tne ^ nan d side of the 
iteration system. The resulting iteration is then given by: 

S u (—eAu — divq + Xi(u — u n )) + u (—eA5 u — div8 q + XiS u ) (4-5) 
+A2<5« = -u (-eAu - divq + Xi(u - u n )) - \ 2 (u - u n ), 

6 _ l^Su 2 ^u T \J5 u q _ _ jVu 

q max(l,7|Vu|) T max(l, 7 | Vu|) 2 max(l, |g|) max(l, 7 | Vw|) ' 

(4.6) 

The resulting algorithm exhibits global and local superlinear convergence properties. 
In Figure [4~3| the residuum of the algorithm in the last 4 iterations is depicted. The pa- 
rameter values used are fi — le — 4, 7 = 50, = le — 10, h = 1 /60, Xi — 769.2199, A 2 = 
30.6396. From the behavior of the residdum, local superlinear convergence is inferred. 
In combination with the outer BFGS iteration, a competitive algorithm for the solu- 
tion of the bilevel problem is obtained. 

For the computational tests we consider the noisy zoomed image of a plane's wing 
(see Figure 4.4). Choosing the parameter values f3 = le — 10, 7 = 100 and e = le — 15, 
the optimal weights AJ = 1847.75 and A 2 = 73.45 were computed on a grid with mesh 



size step h — 1/200. From Figure 4.3 also a good match between the original and the 
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Fig. 4.4. Noisy (left), denoised (center) and original (right) images 



%Displacement 


-10 


-5 





5 


10 


% Sensitivity of A* 


15.56 


0.3 





0.15 


8.26 


K 


1270.9 


1103.4 


1100.7 


1099.8 


1189.8 


a; 


26.39 


48.24 


46.37 


44.92 


28.13 



Table 4.2 

Optimal weight sensitivity by moving the sample image along the diagonal; fi = le— 12, 7 = 50, 
P = le- 10 h= 1/100. 



denoised images can be observed. The noise appears to be succesfully removed with 
the computed optimal weights. 

Further, we tested the stability of the optimal parameter values with respect to 
changes in the source noisy image. This behavior is registered in Table |4.2| On 
the first data column the optimal weights corresponding to a zoomed image displaced 
by 10 grid points down and 10 grid points left is registered. The same kind of data is 
registered in the second column for a displacement of 5 grid points down and 5 left. 
In the fourth column the data corresponds to a displacement in the zoom by 5 grid 
points right and 5 up. Similarly, in the fifth column for 10 grid points. From Table 
|4.2| a robust behavior of the parameter values can be inferred. Indeed, by changing 
the source image by 10%, the optimal weights change less than 16%. 

4.4. Impulse noise. For the last experiment we consider images with so-called 
impulse noise. Specifically, we aim to solve the following parameter estimation prob- 
lem: 

min \\\u-u \\ 2 L2 +(i\ 2 (4.7) 

subject to: 

e(Du, D(v — u))l2 + A / \v — u n \ dx — A / \u — u n \ dx 



[ \Dv\ dx - [ \Du\ dx > 0,Vv G H£(Sl). (4.8) 



Equation (4.8) corresponds to the necessary and sufficient optimality condition for 



the optimization problem: 

min 1 1 1| Du\\ % + \Du\(n) + X J \u-u n \ dxX . (4.9) 
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The L^norm is introduced to deal with the sparse impulse noise in the image. The 
presence of this norm adds, however, an additional nondifferentiability to the opti- 
mization problem. 

For the numerical solution of the lower level problem we consider a Huber type reg- 
ularization of both the TV term and the I/ 1 -norm. Using a common regularization 
parameter 7, the resulting nonlinear PDE takes the following form: 

_ eAu _ div ( 7 V " ) + A ] {u ~ Un) , = 0, (4.10) 
\max(7|Vu|, 1) / max(l, 7|u — u n \) 



or, in primal-dual form, 



-eAu - div q + X p = 0, (4.11) 

q= max( 7 |Vu|,l)' (4 ' 12) 

p= ] {U T Un) „ ■ (4.13) 
max(l,7|-u - u n \) 



The nonlinearities in equation (4.10) are present both in the quasilinear and the 
semilinear terms. Both of them have to be be carefully treated in order to obtain a 
convergent numerical method for the solution. 

Proceeding in a similar manner as in Section 5.2, a semismooth Newton iteration for 
the impulse noise lower level problem is given by 

-eA5 u - divS q +\S p =- (-eAu - divq + Xp) , (4.14) 
7W» j 2 Wu t WS u Vu _ jWu 

q max(l, 7 |Vu|) X - 4 "'max(l,7|Vzi|) 2 |Vu| 9 max(l,7|Vw|) ' K ' ' 

5 J$u l 2 {u-u n )5 u (u-u n ) 

p max(l,7|tt - u n \) 7 max(l, j\u - u n \) 2 \u - u n \ 

7(«"«n) (416) 



max(l, j\u - u n \) 

Using a similar argumentation as for the Gauss+Poisson noise case, we consider the 
modified system 

-eA5 u - divS q + XS P = - (-eAu - divq + Xp) , (4-17) 

s _ 7^^ , 2 Vu T V<S„ g 

q max(l,7|Vu|) 7 max(l, 7I Vw|) 2 max(l, |g|) 

= -9+ ^V^, (4-18) 

max(l, 7| Vm|) 



J$u Y(u-u n )S u p 



p max(l, j\u — u n \) ' ^~ Sl max(l, j\u — u n \) 2 max(l, \p\) 

= -p+ tT n) iv ^ 

max(l,7|u - u n \) 

where we replaced the terms t^t and \ u ~ Uti } on the left hand side by % , and 

maxfi.lpl) ' respectively. 
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Iteration 


A* 


Cost functional 


Residuum 


#SSN iterations 


1 


10.0015 


0.0124 


0.0015 


15 


2 


16.7752 


0.0124 


0.0015 


3 


3 


19.2223 


0.0064 


4.024e-4 


14 


4 


10.0854 


0.0048 


5.496e-4 


12 


5 


24.9562 


0.0123 


0.0014 


17 


6 


26.2300 


0.0018 


1.124e-4 


18 


7 


30.2286 


0.0017 


8.530e-5 


10 


8 


45.8756 


0.0013 


6.794e-5 


12 


9 


48.7340 


7.83e-4 


1.049e-5 


13 


10 


73.2269 


7.55e-4 


9.397e-6 


7 


11 


57.9833 


8.12&-4 


1.54e-5 


12 


12 


58.2922 


6.81e-4 


3.20e-7 
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Table 4.3 

Optimal weight sensitivity by moving the sample image along the diagonal; fj,= le- 
le- 10 h = 1/40. 



12, 7 = 50, 




Fig. 4.5. Noised (left), de-noised (center) and 
1/200 



(right) images, 7 = 50, e = le— 12, h ■■ 



The behavior of the resulting BFGS-SSN algortihm is registered in Table |4.3| For 
the parameter values fj, = le — 12, 7 = 50, j3 = le — 10 h = 1/40 the algorithm 
takes 12 iterations to converge. The number of iterations of the lower level algorithm, 



given through (4.17)- (4.19), is registered in the last column, from which the fast 



convergence of the method is experimentally verified. 
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